Abstract

We address the relation between long range correlations and charge transfer
efficiency in aperiodic artificial or genomic DNA sequences. Coherent charge
transfer through the HOMO states of the guanine nucleotide is studied using
the transmission approach, and the focus is on how the sequence-dependent
backscattering profile can be inferred from correlations between base pairs.

PACS numbers: 87.14.Gg, 72.20.Ee, 72.80.Le

During the past few years, the nature of long range correlations in DNA sequences has
been the subject of intense debate [1-3]. Scale invariant properties in complex genomic
sequences with thousands of nucleotides have been investigated, in particular with wavelet
analysis [2], and have been argued to play a crucial role in gene regulation and cell division.
Moreover, among the many physical, chemical or biological phenomena that might be inferred
from sequence correlations, charge transfer properties deserve particular attention. Indeed, a
precise understanding of DNA-mediated charge migration would have a strong impact on the
description of damage recognition processes and protein binding, or on the engineering of
biological processes [4,5]. The π-stacked array of DNA base pairs (bp) (made up from nucleotides:
guanine g, adenine a, cytosine c, thymine t) provides an extended path to convey long range
charge transport, although dynamical motions of base pairs, or energetic sequence dependent
heterogeneities, are expected to reduce long range efficiency. Photoexcitation experiments
have unveiled that charge excitations can be transmitted between metallointercalators,
preferentially through the guanine highest occupied molecular orbitals (g-HOMO) of the DNA
bridge [5,6]. Such experiments, and mesoscopic transport measurements on single artificial
or genomic DNA sequences contacted between metallic electrodes, have also been the
subject of intense and controversial debate [7]. While accurate determination of absolute
values of conductivity is important, characteristic sequence dependences of charge transport
could provide valuable clues to the mechanisms and biological functions of transport. This issue
has so far been poorly addressed, both experimentally and theoretically. In that perspective,
the possible role of long range correlations in electronic delocalization has been recently
anticipated [8]. In this Letter, the electronic transport properties are shown to be critically
related to the nature and range of correlations.

Rescaling coefficients have been introduced as a useful measure of correlations in DNA
sequences [1]. This measure relies on the evaluation of the second moment of the fluctuations
of sequence composition. The statistical method consists in mapping the nucleotide
sequence onto a walk. A DNA walk is run from the first to the last nucleotide of the
sequence with the rule that the walker steps down [$v(i) = -1$] if a purine (a, g) occurs at
position $i$ along the sequence, whereas the walker steps up [$v(i) = +1$] if a pyrimidine (t, c)
occurs at position $i$. Given a nucleotide sequence of size $N$, the net displacement $x(n)$ of
the nucleotide walker after $n$ steps is $x(n) = \sum_{i=1}^{n} v(i)$, $1 \le n \le N$. Recently, Hurst's
analysis [9] was argued to be more reliable for determining the precise rescaling coefficients
[10]. We thus follow the prescription of Hurst's analysis to construct adjusted variables as
$X(m,k) = \Delta x(m,k) - \frac{k}{n} \Delta x(m,n)$, $1 \le k \le n$, and define the range $S(m,n)$ for random
walks of length $n$ as $S(m,n) = \max_{1 \le k \le n}[X(m,k)] - \min_{1 \le k \le n}[X(m,k)]$. Now, the rescaled
range function $R(n)$ is defined as [9],

$$R(n) = \frac{\langle S(n) \rangle}{\sigma(n)} \propto n^{H} \qquad (1)$$

where $\langle S(n) \rangle = \sum_m S(m,n)/(N-n)$ and $\sigma(n)$ is the standard deviation of $v(i)$ over walks
of length $n$, averaged over the entire sequence. The Hurst exponent $H$ of the process is
then defined through the scaling in Eq. (1). Interestingly, for a short-range correlated random
walk the exact result for the rescaled range function reads $R(n) = \sqrt{\pi n/2} - 1$ [9,11]. Thus,
H = 1/2 for the ordinary Brownian motion. The existence of power-law behaviors suggests 
that there is no characteristic length scale associated with properties under consideration. 
It is clear at first glance that DNA sequences are unlikely to be fully characterized by a single
scaling exponent. One expects the scaling behavior to be different for different length
scales of the sequence, i.e., the rescaling exponent is itself a function of the length scale $n$. In
the case where a characteristic size $n_c$ can be defined, one may postulate that $R(n)$ is still
described by the power law in Eq. (1), but with a scale dependent rescaling exponent $H(n)$
such that $H(n) = H_1$ for $1 < n < n_c$ and $H(n) = H_2$ for $n > n_c$.
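
The rescaled-range construction described above can be sketched numerically. The following Python snippet is a minimal illustration under stated assumptions, not the authors' implementation: it assumes $\Delta x(m,k) = x(m+k) - x(m)$ (the text does not define this increment explicitly), averages the range and the standard deviation over the $N - n$ windows of length $n$, and estimates $H$ from a straight-line fit of $\log R(n)$ versus $\log n$. The function names `dna_walk_steps`, `rescaled_range` and `hurst_exponent` are hypothetical.

```python
import numpy as np

def dna_walk_steps(seq):
    """Map purines (a, g) to -1 and pyrimidines (c, t) to +1, as in the text."""
    return np.array([-1.0 if base in "ag" else 1.0 for base in seq.lower()])

def rescaled_range(steps, n):
    """Return <S(n)> / sigma(n), averaged over the N - n windows of length n."""
    steps = np.asarray(steps, dtype=float)
    # x[j] is the net displacement of the walker after j steps (x[0] = 0)
    x = np.concatenate(([0.0], np.cumsum(steps)))
    k = np.arange(1, n + 1)
    ranges, sigmas = [], []
    for m in range(len(steps) - n):
        dx = x[m + k] - x[m]               # assumed definition of Delta x(m, k)
        adjusted = dx - (k / n) * dx[-1]   # X(m, k); it vanishes at k = n
        ranges.append(adjusted.max() - adjusted.min())
        sigmas.append(steps[m:m + n].std())
    return np.mean(ranges) / np.mean(sigmas)

def hurst_exponent(steps, window_sizes):
    """Slope of log R(n) versus log n over the given window sizes."""
    r = [rescaled_range(steps, n) for n in window_sizes]
    slope, _intercept = np.polyfit(np.log(window_sizes), np.log(r), 1)
    return slope
```

As a sanity check, for a strictly alternating sequence gcgc... every window of even length has range 1 and unit standard deviation, so $R(n) = 1$ and the fitted slope is 0. Fitting separately below and above a chosen $n_c$ would give the two exponents $H_1$ and $H_2$; the window sizes to use are left to the reader.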

In our study, we consider three sequences: a DNA sequence of the first completely
sequenced human chromosome 22 (Ch22), containing about $33.4 \times 10^6$ nucleotides and entitled
NT011520, retrieved from the National Center for Biotechnology Information (NCBI);
a Random DNA sequence (where a, c, t, g are chosen with equal probability 1/4); and a
Fibonacci Polygc quasiperiodic sequence constructed starting from a g-nucleotide as
seed and following the inflation rule $g \to gc$ and $c \to g$. This gives successively
g, gc, gcg, gcggc, gcggcgcg, gcggcgcggcggc, ..., for sequences of length 1, 2, 3, 5, 8, 13,
..., respectively, such that its characteristic self-similar order introduces correlations over a
broad range of scales. The ratio [number of g]/[number of c] approaches the golden mean value
$(1 + \sqrt{5})/2 \simeq 1.618$ in the limit of an infinite sequence. The Random and Fibonacci
sequences are used as prototypes of short-range (or uncorrelated) and strongly correlated
systems, respectively.
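
The inflation rule for the Fibonacci Polygc sequence is simple to reproduce. The sketch below is an illustration only; the function name `fibonacci_gc` and the convention that generation 0 is the seed g are assumptions made here.

```python
def fibonacci_gc(generations):
    """Apply the inflation rule g -> gc, c -> g to the seed 'g'."""
    seq = "g"
    for _ in range(generations):
        seq = "".join("gc" if base == "g" else "g" for base in seq)
    return seq

# Successive generations have Fibonacci lengths 1, 2, 3, 5, 8, 13, ...
first_generations = [fibonacci_gc(j) for j in range(6)]

# Generation 22 has the length listed for the Fibonacci sequence in Table I.
long_seq = fibonacci_gc(22)
ratio_g_to_c = long_seq.count("g") / long_seq.count("c")
```

With this convention, generation 22 contains 46368 nucleotides, of which 28657 are g, matching the $N$ and purine counts of Table I, and `ratio_g_to_c` is already very close to the golden mean.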

The computed functions $R(n)$ for the three sequences described above are reported in
Fig. 1 and values of $H$ are summarized in Table I. It clearly appears from these calculations
that the Random sequence is indeed uncorrelated, following the $\sqrt{\pi n/2}$-law, whereas the
Fibonacci sequence is strongly correlated with a "ballistic behavior", and correlations in the Ch22
sequence exhibit a power-law behavior with a scaling exponent depending on the length
scale. The Ch22 sequence has long-range correlations characterized by Hurst exponents
greater than 1/2 (see Table I). Given the huge number of nucleotides in the Ch22 sequence,
the physically relevant question is rather to what extent charge transport can
be efficient through the g-HOMO, in comparison with uncorrelated random or quasiperiodic
sequences. To shed some light on this question, we now turn to the examination of charge
transfer properties in these sequences. To this end, we consider an effective tight-binding
Hamiltonian describing the energetics of a hole located at nucleotide site $n$ [13,14],

$$\mathcal{H} = \sum_n \varepsilon_n c_n^{\dagger} c_n - \sum_n t_0 \left( c_n^{\dagger} c_{n+1} + \mathrm{h.c.} \right) \qquad (2)$$

where $c_n^{\dagger}$ ($c_n$) is the creation (annihilation) operator of a hole at site $n$. The hole site energies
$\varepsilon_n$ are chosen according to the ionization potentials of the respective bases [14], $\varepsilon_a = 8.24$ eV,
$\varepsilon_t = 9.14$ eV, $\varepsilon_c = 8.87$ eV, and $\varepsilon_g = 7.75$ eV, while the hopping integral, simulating the
π-π stacking between adjacent nucleotides, is taken as $t_0 = 1$ eV. The DNA sequences
are further assumed to be connected to two semi-infinite electrodes whose energies $\varepsilon_m$ are
adjusted to simulate a resonance with the g-HOMO energy level, $\varepsilon_m = \varepsilon_g$, and with hopping
integrals such that $t_m = t_0$. Note that ab initio studies suggest that $t_0 \sim 0.1 - 0.4$ eV
[14], but the choice $t_m/t_0 = 1$ reduces backscattering of holes at the contact electrodes
and allows for a larger accessible transmission spectrum and a better characterization of
DNA's intrinsic conduction [13]. Sites in $[-\infty, 0] \cup [N+1, +\infty]$ belong to
the leads, whereas sites $i = 1, \ldots, N$ are associated with the sequence of size $N$ under study. The
transmission coefficients are computed using the transfer matrix formalism, in which the
time-independent Schrödinger equation is projected onto a localized basis by properly accounting
for the boundary conditions [15]. Letting $\psi_n$ denote the wavefunction with energy $E$ at site $n$, we
obtain from Eq. (2) the recurrence equation,

$$\begin{pmatrix} \psi_{n+1} \\ \psi_n \end{pmatrix} = M_n \begin{pmatrix} \psi_n \\ \psi_{n-1} \end{pmatrix}$$

where $M_n$ is a $2 \times 2$ matrix with elements $M_n(1,1) = (E - \varepsilon_n)/t_{n+1}$, $M_n(1,2) = -t_n/t_{n+1}$,
$M_n(2,1) = 1$ and $M_n(2,2) = 0$. The transmission coefficient $T_N(E)$, which gives the fraction
of tunneling electrons transmitted through the $N$-site DNA, is related to the Landauer
resistance as $(h/2e^2)[1 - T_N(E)]/T_N(E)$, where $h/2e^2$ is the quantum resistance and [15],

(3)

with $V = M_N M_{N-1} \cdots M_1$. For a given energy, $T_N(E)$ reflects the level of backscattering
events in the hole transport through the sequence. As the metallic leads are adjusted to the
g-HOMO energy level, the hole transport will experience a sequence dependent contribution of
backscattering according to the distribution of c, t, and a potential barriers over the length
scale of the sequence. To compare transmission properties of different chains, the behavior
of the Lyapunov coefficient, $\gamma_N(E) = \frac{1}{2N} \ln T_N(E)$, is also calculated. $\gamma_N(E)$ has been
extensively investigated to sort out the main features of complex localization patterns [16,17].
For systems with uncorrelated disorder, $\gamma_N(E)$ provides the localization length
$\xi(E) = 1/[\lim_{N \to \infty} \gamma_N(E)]$. In the presence of scale invariance properties, the underlying structure of
$\gamma_N(E)$ reflects the self-similarity of the spectrum [17].
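
The transfer matrix procedure can be illustrated with a short Python sketch. It is a hedged example rather than the authors' code: it takes all hopping integrals equal to $t_0$ (so every $M_n$ has the form $[[(E - \varepsilon_n)/t_0, -1], [1, 0]]$), and, instead of the closed-form expression of Eq. (3), it matches plane waves in the leads (incident plus reflected wave on the left, transmitted wave on the right) and solves the resulting $2 \times 2$ linear system for the reflection and transmission amplitudes. The names `SITE_ENERGY`, `transmission` and `lyapunov` are hypothetical, and the normalization $\ln T_N / 2N$ of the Lyapunov coefficient is taken as an assumption.

```python
import numpy as np

# Hole site energies in eV, as quoted in the text
SITE_ENERGY = {"a": 8.24, "t": 9.14, "c": 8.87, "g": 7.75}

def transmission(seq, energy, t0=1.0, lead_energy=7.75):
    """Transmission through `seq` between two uniform leads (assumed model)."""
    total = np.eye(2)
    for base in seq:
        m_n = np.array([[(energy - SITE_ENERGY[base]) / t0, -1.0],
                        [1.0, 0.0]])
        total = m_n @ total                  # builds M_N ... M_1
    cos_k = (energy - lead_energy) / (2.0 * t0)
    if abs(cos_k) >= 1.0:
        return 0.0                           # no propagating state in the leads
    z = np.exp(1j * np.arccos(cos_k))        # e^{ik} in the leads
    n = len(seq)
    # Left lead: psi_j = z**j + r * z**(-j); right lead: psi_j = tau * z**j.
    # (psi_{N+1}, psi_N) = total @ (psi_1, psi_0) gives two equations for (r, tau).
    lhs = np.array([[total[0, 0] / z + total[0, 1], -z ** (n + 1)],
                    [total[1, 0] / z + total[1, 1], -z ** n]])
    rhs = -np.array([total[0, 0] * z + total[0, 1],
                     total[1, 0] * z + total[1, 1]])
    _r, tau = np.linalg.solve(lhs, rhs)
    return float(abs(tau) ** 2)

def lyapunov(seq, energy, **kwargs):
    """Assumed normalization: ln(T_N) / (2N); diverges where T_N vanishes."""
    return np.log(transmission(seq, energy, **kwargs)) / (2 * len(seq))
```

Two limiting cases are easy to check by hand: a pure poly(g) chain is indistinguishable from the leads and transmits perfectly inside the lead band, and a single c site at $E = \varepsilon_g$ gives $T = 4/(4 + V^2)$ with $V = \varepsilon_c - \varepsilon_g = 1.12$ eV. For long sequences the matrix product grows rapidly inside gaps, so a production calculation would need a numerically stabilized scheme.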

Following our analysis of correlations, $T_N(E)$ has been computed for the three sequences of
Table I, varying the sequence length. The random and Fibonacci quasiperiodic
based sequences are generated starting from the first nucleotide of the sequence up to $N$ bp,
while the Ch22-based sequences are constructed by starting from bp 15000 of the full
Ch22 sequence and then extracting the first $N$ bp, namely ag ggcatcgctaacgag gtcgccgtccaca
gcatcgctatcgag gacaccacaccgtcca for $N = 60$ bp. Figures 2 and 3 present the comparison
of $T_N(E)$ between the quasiperiodic and Ch22 sequences and between uncorrelated random
DNA and Ch22 sequences, respectively, with the same number of bp. Lyapunov coefficients
for quasiperiodic and Ch22-based sequences are also displayed in Fig. 4.

The general trend in Figs. 2 and 3 is that $T_N(E)$ is characterized by an energy spectrum
of resonant peaks with high transmission. As the sequence length increases, far fewer
states show good transmission, due to the progressive fragmentation of the spectrum,
although several peaks with high transmission remain at certain energy values, and new ones
may appear. For Fibonacci and Ch22-based sequences, these resonant energies are robust
enough to persist against backscattering effects due to interspersed bases along the sequence.
This point is illustrated in Figs. 2 and 3, where one observes that Fibonacci (resp. Ch22-based)
sequences of 180 bp (resp. 360 bp) exhibit states with better transmission properties
than those present in a 60 bp (resp. 300 bp) long sequence. In addition, the Lyapunov
coefficient $\gamma_N(E)$ shown in Fig. 4 illustrates intrinsic properties of the two correlated
sequences, albeit of different nature.
Indeed, the series of main elliptic bumps found in the Fibonacci sequence with 60 bp is
reproduced in the 480 bp sequence, which presents additional features associated with the
partitioning of the spectrum. While self-similarity fully characterizes the quasiperiodic sequence,
the scaling properties in Ch22 rely on a totally different kind of long range correlations, with
no hint of self-similar patterns.

In contrast, the fragmentation of the spectrum strongly affects the transmission of the
uncorrelated random sequence. All resonant states (if any) are evenly affected and the
corresponding transmission decreases as the sequence length increases. From a statistical
analysis over many random sequences, it clearly appears that Ch22-based sequences exhibit
much higher charge transfer efficiency over much longer distances in comparison with
uncorrelated random sequences.

Nevertheless, to improve our understanding and gain some physical insight into the
characteristic features exhibited by these sequences, we now focus on quasiperiodic sequences,
since it has been shown that the global structure of the electronic spectrum of such chains
can be obtained in practice by considering very short periodic approximants to infinite
quasiperiodic chains [17]. These sequences are characterized by long range correlations that
manifest themselves in electronic properties as power-law localization of eigenstates
in the thermodynamic limit or a power-law increase of the Landauer resistance in finite
samples [17]. For this purpose, we consider a periodic approximant whose unit cell is
gcggc. The dispersion relation of this approximant is given by

$$2 t_0^5 \cos(5q) = (E - \varepsilon_g)^3 (E - \varepsilon_c)^2 - t_0^2 (E - \varepsilon_g)(E - \varepsilon_c)(5E - 4\varepsilon_g - \varepsilon_c) + t_0^4 (5E - 3\varepsilon_g - 2\varepsilon_c).$$

The energy spectrum of the gcggc approximant is composed of three broad bands (of bandwidth ~ 0.5 - 0.6
eV) centered at the energies $E_2 = 6.915$ eV, $E_3 = 8.143$ eV and $E_4 = 9.527$ eV, plus two
narrower bands (of bandwidth ~ 0.25 eV) located at the edges of the spectrum at $E_1 = 6.191$
eV and $E_5 = 10.213$ eV. These analytical results allow us to properly assign the different
resonant peaks appearing in the spectrum of the transmission coefficient (shown in the inset
in Fig. 2) with respect to the four main sub-bands of the spectral window [5.75, 9.75 eV].
States belonging to the broader central bands around $E_2 = 6.915$ eV and $E_3 = 8.143$ eV
turn out to be very robust to the progressive fragmentation of the energy spectrum.
Accordingly, one is tempted to conclude from the simple inspection of Fig. 2 (left frames) that
these states should exhibit good transport properties even in the thermodynamic limit. To
further substantiate such an assertion, we consider in addition the transmission coefficient
corresponding to the gcggc approximant,

$$T_N(E) = \left[ 1 + q(x,y)\, U_{N-1}^2(w) \right]^{-1} \qquad (4)$$



where $x = (E - \varepsilon_c)/2t_0$, $y = (E - \varepsilon_g)/2t_0$, $w = 16x^2y^3 - 16xy^2 - 4yx^2 + 3y + 2x$, $U_{N-1}(w)$
is a Chebyshev polynomial of the second kind, and $q(x,y) = A^2/(1 - y^2) + B^2 - 1$, with
$A = -24xy^3 - 16x^2y^2 + 6xy + 2x^2 + 32x^2y^4 + 4y^4 + y^2$ and $B = 32x^2y^3 - 8x^2y - 24xy^2 + 4y^3 + 3y + 2x$.
The resonance condition then reads $q(x,y)\, U_{N-1}^2(w) = 0$, while the condition $q(x,y) = 0$
yields $E_l = 4.317$ eV (which does not belong to the spectrum) and $E_u = 10.158$ eV (located
near the center of the uppermost band, which is not included in our spectral window). On
the other hand, the roots of the Chebyshev polynomial label a full transmission peak series
according to the relationship $w = \cos(5k\pi/N)$ with $k = 0, \ldots, N$. This is illustrated in the
inset of Fig. 2 (top-left), where one observes oscillations in the energy dependence of the
transmission curve for a sequence cgccg with 10 units. By a deeper analysis, we find that
Fibonacci quasiperiodic sequences as long as 160 nm, i.e., ~ 450 bp, will still allow for nearly
resonant transmission around two specific energies, $E_2 \simeq 6.9$ eV and $E_3 \simeq 8.1$ eV.
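
The band structure quoted above for the gcggc approximant can be cross-checked numerically. The sketch below is an illustration under stated assumptions: it takes $t_0 = 1$ eV and the site energies of the text, builds the polynomial equal to the trace of the five-site unit-cell transfer matrix (that is, $2\cos(5q)$), and identifies band centers with its roots ($\cos 5q = 0$) and band edges with the energies where it equals $\pm 2$. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

def gcggc_trace_polynomial(e_g=7.75, e_c=8.87, t0=1.0):
    """Polynomial in E equal to 2*cos(5q) for the periodic gcggc chain."""
    a = Polynomial([-e_g, 1.0]) / t0   # (E - e_g) / t0
    b = Polynomial([-e_c, 1.0]) / t0   # (E - e_c) / t0
    return a**3 * b**2 - a * b * (4 * a + b) + 3 * a + 2 * b

def band_centers(poly):
    """Energies where cos(5q) = 0, one per band."""
    return np.sort(poly.roots().real)

def band_edges(poly):
    """Pairs (lower, upper) of energies where |2*cos(5q)| = 2."""
    edges = np.sort(np.concatenate([(poly - 2).roots().real,
                                    (poly + 2).roots().real]))
    return edges.reshape(-1, 2)

poly = gcggc_trace_polynomial()
centers = band_centers(poly)
bands = band_edges(poly)
widths = bands[:, 1] - bands[:, 0]
```

Under these assumptions `centers` agrees with the quoted values $E_1, \ldots, E_5$ (6.191, 6.915, 8.143, 9.527 and 10.213 eV) to within a few meV, and `widths` can be compared with the bandwidths stated in the text.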

In summary, when compared with uncorrelated sequences, long range correlations in
aperiodic DNA sequences seem to induce coherent charge transfer over longer length scales.
This feature has been illustrated in particular in Chromosome 22-based sequences. Given
that the nature of long range correlations differs in coding versus non-coding regions of
genomic DNA [3], a more systematic study of charge transport in genomic DNA is called for.

TABLES

Sequence     N        Purines   H(n_c = 300)
                                H_1      H_2
Ch22         182617   91029     0.60     0.75
Random       182617   91118     0.50     0.50
Fibonacci    46368    28657     0.085    0.011

TABLE I. Hurst exponents calculated from data in Fig. 1.

FIGURES

FIG. 1. Rescaled range function R(n) versus n. Dashed line corresponds to $\sqrt{\pi n/2} - 1$.
FIG. 2. Transmission coefficient for Fibonacci Polygc quasiperiodic (left frames) and
Ch22-based sequences (right frames). Inset: $T_N(E)$ in Eq. (4) for a periodic approximant of length
$N = 50$ bp.
FIG. 3. Transmission coefficient for Ch22-based sequences (main frames) and typical results
(over about 50 sequences) for uncorrelated DNA random chains (insets) with same number of 
nucleotides. 
FIG. 4. Lyapunov coefficient for Ch22-based (main frame) and Fibonacci Polygc quasiperiodic
sequences (inset).